
@hiroTamada hiroTamada commented Jan 12, 2026

Summary

  • Remove the build queue - builds now start immediately when created
  • Delete lib/builds/queue.go and lib/builds/queue_test.go
  • Add ErrResourcesExhausted error and 503 response type for resource exhaustion
  • Update OpenAPI spec with 503 response and deprecated queue_position field

Motivation

The build queue added complexity without providing significant benefits:

  • Builder VMs use similar resources to regular VMs
  • Queueing delays builds when resources may be available
  • Removing queue state management simplifies the architecture

Key Changes

| Aspect | Before | After |
| --- | --- | --- |
| Build start | Queued, starts when slot available | Starts immediately |
| Capacity exceeded | Queued with position | Build fails with resource error |
| Cancel queued | Removes from queue | N/A (no queue) |
| Recovery on restart | Re-enqueue pending | Start pending immediately |

Files Changed

  • Deleted: lib/builds/queue.go, lib/builds/queue_test.go
  • Modified: manager.go, types.go, errors.go, metrics.go, storage.go
  • API: Added 503 response to openapi.yaml, updated cmd/api/api/builds.go
  • Tests: Updated manager_test.go to remove queue tests
  • Docs: Updated README.md to reflect new architecture

Test plan

  • go build ./... - project compiles
  • go test ./lib/builds/... - all builds package tests pass
  • Manual testing: create a build and verify it starts immediately with building status
  • Manual testing: verify build completion and logs streaming work correctly

Note

Removes the in-memory build queue so builds start immediately and adjusts API/behavior to match.

  • Manager: drop BuildQueue; CreateBuild sets status building and runs build in background; CancelBuild now stops the builder instance; recovery restarts building/pushing builds; detect resource exhaustion on instance create and map to ErrResourcesExhausted.
  • API/OpenAPI: remove queued status; deprecate Build.queue_position (always null); POST /builds now "created and started"; add 503 resources_exhausted with Retry-After and wire in cmd/api/api/builds.go; regenerate OAPI client/server code.
  • Metrics: remove queue/active gauges; keep duration and total counters.
  • Storage/README/tests: update for immediate-start model; delete lib/builds/queue.go and its tests; adjust recovery to only building/pushing.
  • Build deps: promote github.com/opencontainers/go-digest to direct dep.

Written by Cursor Bugbot for commit 81d9976.

Replace the explicit build queue with immediate build execution.
Builds now start immediately when created instead of being queued.

Key changes:
- Delete queue.go and queue_test.go
- Remove StatusQueued - builds start with StatusBuilding
- Add ErrResourcesExhausted error for resource limit detection
- Add 503 response with Retry-After header to OpenAPI spec
- Update manager to start builds in goroutines directly
- Simplify CancelBuild to only handle running builds
- Update RecoverPendingBuilds to start builds immediately
- Remove queue-related metrics (queueLength, activeBuilds)
- Update tests and documentation

If host resources are exhausted during instance creation, the build
will fail with a clear error message. The API includes a 503 response
type with Retry-After header for future pre-check implementation.

github-actions bot commented Jan 12, 2026

✱ Stainless preview builds

This PR will update the hypeman SDKs with the following commit message.

refactor: remove build queue for immediate build execution

Edit this comment to update it. It will appear in the SDK's changelogs.

hypeman-typescript

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

npm install https://pkg.stainless.com/s/hypeman-typescript/612efe04f67080ba83007f7f00eb597dd2432700/dist.tar.gz

hypeman-go

Your SDK built successfully.
generate ⚠️ · lint ✅ · test ✅

go get github.com/stainless-sdks/hypeman-go@8989e16cb1a35a1063ebcdd2f0221f260374d765

hypeman-cli: conflict

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2026-01-12 16:14:37 UTC


tembo bot commented Jan 12, 2026

A couple things to consider with the queue removal:

  • lib/builds/manager.go: CreateBuild now always returns a build after spawning go m.runBuild(...). With that, ErrResourcesExhausted (detected in executeBuild) can’t propagate back to the caller, so the new 503 handling in cmd/api/api/builds.go likely never triggers. If the intent is “fail fast on create when resources are exhausted”, you may want a synchronous preflight/reservation step before returning the 202.

  • lib/builds/manager.go: the resource exhaustion detection via strings.Contains(err.Error(), "exceeds") && strings.Contains(..., "limit") feels pretty brittle and easy to break with wording changes. If instances.Manager has (or can expose) a typed/sentinel error for capacity/limit failures, that’d make this mapping much more reliable.

  • lib/builds/storage.go: listPendingBuilds no longer includes queued. If there are any existing builds on disk with status queued from previous versions, they’ll never get recovered and could be stuck indefinitely. A short-term backwards-compat path (treat "queued" as pending) or a migration step could help here.

  • lib/builds/manager.go: CancelBuild now deletes the builder instance but ignores the return error; might be worth at least logging failures (and/or using a short context.WithTimeout(context.Background(), ...) so cleanup isn’t skipped if the request context is already canceled).

  • openapi.yaml / generated lib/oapi/oapi.go: queue_position is marked deprecated, but generated code calls out no x-deprecated-reason. If you want nicer client docs, adding an explicit reason/extension in the spec would avoid that awkward generated comment.

@rgarcia rgarcia left a comment

lgtm

if err != nil {
    // Check if this is a resource exhaustion error
    errStr := err.Error()
    if strings.Contains(errStr, "exceeds") && strings.Contains(errStr, "limit") {
Contributor


unclear how this ErrResourcesExhausted error makes it back to the API handler - since executeBuild is called from runBuild which runs in a background goroutine, the caller has already received a 202 by the time this runs. the 503 path in the handler seems unreachable?

Contributor Author


You're absolutely right - that was exactly the issue. I've fixed this by adding a synchronous preflight resource check in CreateBuild that runs before spawning the async goroutine:

// Preflight check: verify resources are available before accepting the build
builderMemory := int64(policy.MemoryMB) * 1024 * 1024
if err := m.instanceManager.CheckResourceAvailability(ctx, policy.CPUs, builderMemory); err != nil {
    if errors.Is(err, instances.ErrResourcesExhausted) {
        return nil, fmt.Errorf("%w: %v", ErrResourcesExhausted, err)
    }
    return nil, fmt.Errorf("check resource availability: %w", err)
}

Added a new CheckResourceAvailability() method to instances.Manager that checks per-instance and aggregate limits without actually creating an instance. Now the 503 path is reachable when resources are truly exhausted at build creation time.

Also added proper sentinel errors (ErrResourcesExhausted) to both packages so we can use errors.Is() instead of brittle string matching.
